Prediction of protein subcellular locations using fuzzy k-NN method
Abstract
MOTIVATION Protein localization data are a valuable resource for elucidating protein functions. It is highly desirable to predict a protein's subcellular locations automatically from its sequence. RESULTS In this paper, the fuzzy k-nearest neighbors (k-NN) algorithm is introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed on a new data set derived from version 41.0 of the SWISS-PROT databank, and an overall predictive accuracy of about 80% is achieved in a jackknife test. The result demonstrates the applicability of this relatively simple method and suggests that prediction accuracy for protein subcellular locations can be improved. We also applied this method to annotate six fully sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins. AVAILABILITY Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm
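The pipeline the abstract describes (dipeptide composition as features, fuzzy k-NN as classifier, a jackknife test for evaluation) can be illustrated with a minimal sketch. The abstract does not give the distance measure, the value of k, or the fuzzifier m, so the Euclidean distance, `k=3`, `m=2.0`, the inverse-distance weighting in the style of Keller et al., and the crisp training labels below are all assumptions. The toy sequences and their labels are invented for illustration and are not taken from the SWISS-PROT data set.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of dipeptide frequencies of seq."""
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    for a, b in zip(seq, seq[1:]):
        i = INDEX.get(a + b)
        if i is not None:  # skip pairs containing non-standard residues
            counts[i] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def fuzzy_knn(query, train_x, train_y, k=3, m=2.0):
    """Return class memberships (summing to 1) for query.

    Assumes crisp training labels; each of the k nearest neighbours
    votes with weight 1 / distance**(2 / (m - 1)).
    """
    nearest = sorted((dist(query, x), y) for x, y in zip(train_x, train_y))[:k]
    eps = 1e-12  # avoids division by zero for identical vectors
    weights = {}
    for d, y in nearest:
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
        weights[y] = weights.get(y, 0.0) + w
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def jackknife_accuracy(xs, ys, k=3, m=2.0):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        memberships = fuzzy_knn(xs[i], rest_x, rest_y, k, m)
        if max(memberships, key=memberships.get) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: sequences and labels are made up for illustration.
toy_seqs = ["KKKRKKRK", "KRKKRRKK", "LLVLLAVL", "LVLLALLV"]
ys = ["nuclear", "nuclear", "membrane", "membrane"]
xs = [dipeptide_composition(s) for s in toy_seqs]

print(fuzzy_knn(xs[0], xs[1:], ys[1:]))
print(jackknife_accuracy(xs, ys))
```

Unlike plain k-NN, the classifier returns a membership per class rather than a single vote, so a borderline protein shows up as a split membership instead of a hard label.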
Similar resources
Prediction of protein subcellular locations using Markov chain models.
A novel method was introduced to predict protein subcellular locations from sequences. Using sequence data, this method achieved a prediction accuracy higher than previous methods based on the amino acid composition. For three subcellular locations in a prokaryotic organism, the overall prediction accuracy reached 89.1%. For eukaryotic proteins, prediction accuracies of 73.0% and 78.7% were att...
Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method
Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the k-nearest neighbor (k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates the distributions of multiple classes of enzymes are highly overlapped. To cope with the overlap prob...
Prediction of protein subcellular multisite localization using a new feature extraction method.
A basic problem of proteomics is identifying the subcellular locations of a protein. One factor making the problem more complicated is that some proteins may simultaneously exist in two or more than two subcellular locations. To improve multisite prediction quality, it is necessary to use effective feature extraction methods. Here, we developed a new feature extraction method based on the pK va...
Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs
MOTIVATION The subcellular location of a protein is closely correlated to its function. Thus, computational prediction of subcellular locations from the amino acid sequence information would help annotation and functional prediction of protein coding genes in complete genomes. We have developed a method based on support vector machines (SVMs). RESULTS We considered 12 subcellular locations in...
Comparison of the performance of the Cox model and the k-nearest neighbor method in estimating the survival of kidney transplant patients
Introduction & Objective: The Cox model is a common method for estimating survival, and the validity of its results depends on the proportional hazards assumption. K-nearest neighbor is a nonparametric method for estimating survival probability in heterogeneous communities. The purpose of this study was to compare the performance of the k-nearest neighbor method (K-NN) with the Cox model. Materials & Methods: This ...
Journal:
- Bioinformatics
Volume 20, Issue 1
Pages: -
Publication date: 2004